5 research outputs found

    Explaining Image Classifiers with Multiscale Directional Image Representation

    Image classifiers are known to be difficult to interpret and therefore require explanation methods to understand their decisions. We present ShearletX, a novel mask explanation method for image classifiers based on the shearlet transform, a multiscale directional image representation. Current mask explanation methods are regularized by smoothness constraints that protect against undesirable fine-grained explanation artifacts. However, the smoothness of a mask limits its ability to separate fine-detail patterns that are relevant for the classifier from nearby nuisance patterns that do not affect the classifier. ShearletX solves this problem by avoiding smoothness regularization altogether, replacing it with shearlet sparsity constraints. The resulting explanations consist of a few edges, textures, and smooth parts of the original image that are the most relevant for the decision of the classifier. To support our method, we propose a mathematical definition for explanation artifacts and an information-theoretic score to evaluate the quality of mask explanations. We demonstrate the superiority of ShearletX over previous mask-based explanation methods using these new metrics, and present exemplary situations where separating fine-detail patterns allows explaining phenomena that were not explainable before.

    SuperHF: Supervised Iterative Learning from Human Feedback

    While large language models demonstrate remarkable capabilities, they often present challenges in terms of safety, alignment with human values, and stability during training. Here, we focus on two prevalent methods used to align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). SFT is simple and robust, powering a host of open-source models, while RLHF is a more sophisticated method used in top-tier models like ChatGPT but also suffers from instability and susceptibility to reward hacking. We propose a novel approach, Supervised Iterative Learning from Human Feedback (SuperHF), which seeks to leverage the strengths of both methods. Our hypothesis is two-fold: that the reward model used in RLHF is critical for efficient data use and model generalization, and that the use of Proximal Policy Optimization (PPO) in RLHF may not be necessary and could contribute to instability issues. SuperHF replaces PPO with a simple supervised loss and a Kullback-Leibler (KL) divergence prior. It creates its own training data by repeatedly sampling a batch of model outputs and filtering them through the reward model in an online learning regime. We then break down the reward optimization problem into three components: robustly optimizing the training rewards themselves; preventing reward hacking (exploitation of the reward model that degrades model performance), as measured by a novel METEOR similarity metric; and maintaining good performance on downstream evaluations. Our experimental results show SuperHF exceeds PPO-based RLHF on the training objective, easily and favorably trades off high reward with low reward hacking, improves downstream calibration, and performs the same on our GPT-4 based qualitative evaluation scheme, all while being significantly simpler to implement, highlighting SuperHF's potential as a competitive language model alignment technique.
    Comment: Accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 202
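
The sample, filter, and supervised-update loop described in the SuperHF abstract can be illustrated with a toy sketch. This is not the authors' implementation: the categorical "policy", the `reward_fn`, the top-k filtering rule, the direction of the KL term, and the `kl_weight` value are all assumptions made here for illustration only.

```python
import math
import random

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def filter_top_k(completions, reward_fn, k):
    """Keep the k completions that the reward function scores highest."""
    return sorted(completions, key=reward_fn, reverse=True)[:k]

def superhf_style_loss(policy, reference, kept, kl_weight):
    """Mean negative log-likelihood on the kept samples plus a KL penalty.

    The KL direction and weighting are assumptions, not taken from the paper.
    """
    nll = -sum(math.log(policy[c]) for c in kept) / len(kept)
    return nll + kl_weight * kl_divergence(policy, reference)

# Toy setup: a "completion" is just an index into a 4-way categorical policy.
reference = [0.25, 0.25, 0.25, 0.25]
policy = [0.25, 0.25, 0.25, 0.25]

def reward_fn(completion):
    """Hypothetical reward model: a larger index counts as a better output."""
    return float(completion)

rng = random.Random(0)
batch = rng.choices(range(4), weights=policy, k=16)  # sample model outputs
kept = filter_top_k(batch, reward_fn, k=4)           # filter via the reward
loss = superhf_style_loss(policy, reference, kept, kl_weight=0.1)
```

In a real system the policy would be a language model and the loss would be minimized by gradient descent over repeated rounds; the sketch computes the loss for a single round only.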

    Negative body experience in women with early childhood trauma: associations with trauma severity and dissociation

    Background: A crucial but often overlooked impact of early life exposure to trauma is its far-reaching effect on a person’s relationship with their body. Several domains of body experience may be negatively influenced or damaged as a result of early childhood trauma. Objective: The aim of this study was to investigate disturbances in three domains of body experience: body attitude, body satisfaction, and body awareness. Furthermore, associations between domains of body experience and severity of trauma symptoms as well as frequency of dissociation were evaluated. Method: Body attitude was measured with the Dresden Body Image Questionnaire, body satisfaction with the Body Cathexis Scale, and body awareness with the Somatic Awareness Questionnaire in 50 female patients with complex trauma and compared with scores in a non-clinical female sample (n = 216). Patients in the clinical sample also filled out the Davidson Trauma Scale and the Dissociation Experience Scale. Results: In all measured domains, body experience was severely affected in patients with early childhood trauma. Compared with scores in the non-clinical group, effect sizes in Cohen’s d were 2.7 for body attitude, 1.7 for body satisfaction, and 0.8 for body awareness. Associations between domains of body experience and severity of trauma symptoms were low, as were the associations with frequency of dissociative symptoms. Conclusions: Early childhood trauma in women is associated with impairments in self-reported body experience that warrant careful assessment in the treatment of women with psychiatric disorders.
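
For readers unfamiliar with the effect sizes quoted above, Cohen's d for two independent groups is commonly computed as the difference in means divided by a pooled standard deviation. The sketch below assumes that pooled-SD variant; the abstract does not say which variant the authors used. The group sizes (50 and 216) come from the abstract, while the means and standard deviations are made-up placeholders, not study data.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical scores: only the group sizes are taken from the abstract.
d = cohens_d(mean_a=12.0, sd_a=2.0, n_a=216, mean_b=10.0, sd_b=2.0, n_b=50)
```

With equal standard deviations the pooled value equals that common standard deviation, so d here is simply the mean difference expressed in standard-deviation units.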